-
Notifications
You must be signed in to change notification settings - Fork 107
feat(datasets): make table write mode configurable #1093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(datasets): make table write mode configurable #1093
Conversation
…ble materialization - feat: Introduces a `mode` parameter for save operations, allowing "append", "overwrite", "error", and "ignore" options to control write behavior. - Supports legacy `overwrite` flag with backward compatibility and deprecation warning. - Adds mode dispatching logic to handle different write scenarios. - Updates examples and docs to reflect new parameter. - Prevents simultaneous use of both `mode` and legacy `overwrite` to avoid ambiguity. - docs: Improves documentation and usage examples for new save modes. Semi-breaking change: Replaces `overwrite` save argument with `mode`; users should update configurations to use `mode`. Signed-off-by: gitgud5000 <[email protected]>
2e9e539
to
a41bb19
Compare
Signed-off-by: gitgud5000 <[email protected]>
a41bb19
to
1f609c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
Did a quick-pass review. I see you're still actively updating, so I'll revisit this later.
|
||
self._materialized = self._save_args.pop("materialized") | ||
|
||
# Handle mode / overwrite conflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wonder if insert + view needs to be disallowed; I haven't actually checked yet.
I've been activily working with this because I need the features now 😅. I've also added support for passing credentials to the |
ibis.TableDataset
add configurable save mode for table materializationibis.TableDataset
add configurable save mode for table materialization and credentials
support
…precate connection - feat: introduce a new `credentials` parameter to accept connection info as a string or dict, preferred over the deprecated `connection` param - warn if both `credentials` and `connection` are provided, prioritizing `credentials` - support connection strings and dicts with connection strings for backend connections - feat: add backend name extraction to be used in `_describe()` method - docs: update docstrings to explain `credentials` and deprecation of `connection` !Semi- Breaking Change: deprecates the `connection` parameter in favor of `credentials` Signed-off-by: gitgud5000 <[email protected]>
…lacing the function dispatch dict with direct conditional handling, improving clarity and maintainability - docs: expands and clarifies docstring to explain available save modes and their effects, referencing Spark semantics for familiarity Signed-off-by: gitgud5000 <[email protected]>
Signed-off-by: gitgud5000 <[email protected]>
d545871
to
6445239
Compare
Best reason. 😁
Sorry, I didn't check in time; ideal is a separate PR, but it's probably no big deal to have it in the same. If it ends up being a blocker to get something merged, can always split it out later. |
- fix: add early return for empty DataFrame - fix: ensure table creation occurs before insert operations in append mode when table doesn't exist Signed-off-by: gitgud5000 <[email protected]>
40bbfe3
to
6130581
Compare
Could you please take a look at this pull request, @ankatiyar @ravi-kumar-pilla? |
@gitgud5000 I'll take a look, in the meanwhile could you add some unit tests to go along with the changes? :D |
- fix: remove lingering 'mode' key when mapping legacy overwrite to prevent unexpected writer kwargs - fix: treat empty pandas DataFrame as a no-op in save, supporting both pandas and ibis tables Prevents accidental parameter leakage and avoids errors when saving empty data. Signed-off-by: gitgud5000 <[email protected]>
…and legacy overwrite behavior Signed-off-by: gitgud5000 <[email protected]>
db200f3
to
55b8666
Compare
… new save modes and credentials parameter Signed-off-by: gitgud5000 <[email protected]>
Signed-off-by: gitgud5000 <[email protected]>
Signed-off-by: gitgud5000 <[email protected]>
@ankatiyar Done ✅ I wasn’t able to resolve a couple of things:
=========================== short test summary info ============================
FAILED kedro-datasets/tests/ibis/test_table_dataset.py::TestTableDataset::test_connection_config[None-None-None-connection_config1-key1]
|
Signed-off-by: Ankita Katiyar <[email protected]>
63b95c1
to
4c6ddb9
Compare
f819670
to
cda534a
Compare
Signed-off-by: Deepyaman Datta <[email protected]>
1057b14
to
513aab0
Compare
ibis.TableDataset
add configurable save mode for table materialization and credentials
supportSigned-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
Signed-off-by: Deepyaman Datta <[email protected]>
62d976a
to
55ba8a0
Compare
The code looks good to me! I'm wondering if we need to support |
yes, the next release will likely be a major one (given the new GenAI datasets), so doing the breaking change now could make sense. That said, I’m not sure how widely the Ibis dataset is used or whether existing users would expect backward compatibility / a deprecation phase before overwrite is removed. If usage is low, it might be fine to go straight to mode and make it part of the next major. |
I believe there are a fair number of users, and I'd rather keep the deprecation warning given that (1) it's already implemented and (2) it's not some complicated functionality to maintain, and there are no significant tradeoffs/downsides to having the functionality.
What is the timeline for the next release? @gitgud5000 also implemented credentials support as part of this PR that we were split out/are moving to a follow up. I assume that would also be good to get in for the next release. |
Signed-off-by: Deepyaman Datta <[email protected]>
586c24c
to
6912ec6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @gitgud5000 and @deepyaman 💯
…de argument (PR kedro-org#1093) and credentials support in the upcoming changes section. Signed-off-by: gitgud5000 <[email protected]>
TODO(deepyaman): Move notes on
credentials
support to a new PR from @gitgud5000 with those changes, once it exists. I'm not deleting the detailed notes from here until they're moved elsewhere.feat(datasets): ibis.TableDataset add
mode
andcredentials
supportmode
parameter for save operations, allowing "append", "overwrite", "error", and "ignore" options to control write behavior.overwrite
flag with backward compatibility and deprecation warning.mode
and legacyoverwrite
to avoid ambiguity.credentials
parameter to accept connection info (dict or string URI), supersedingconnection
.credentials
and deprecatedconnection
are provided._describe()
to use it.Semi-breaking change:
overwrite
andconnection
are deprecated; users should migrate tomode
andcredentials
.Description
This PR introduces two key enhancements to
ibis.TableDataset
:Configurable Save Modes:
mode
parameter tosave_args
similar to Spark’sDataFrameWriter.mode
and Pandas’to_csv(mode=...)
, with support for:"append"
: Insert data into an existing table (requires the backend to implementinsert()
)."overwrite"
: Drop and recreate the table/view."error"
or"errorifexists"
: Fail if the table already exists."ignore"
: Do nothing if the table exists; otherwise, create it.overwrite=True|False
maps tomode="overwrite"|"error"
.mode
andoverwrite
are specified simultaneously.credentials
Parameter:credentials
as the preferred method for specifying Ibis backend connection configurations.con
string).connection
parameter._connect()
and_describe()
to support new functionality.Development notes
mode
argument).DataFrameWriter.mode
and Pandas’sto_csv(mode=…)
.overwrite
behavior.ibis.connect()
Expand for details:
Save Mode Handling
DEFAULT_SAVE_ARGS
updated to usemode="overwrite"
by default.__init__
:mode
/overwrite
presence.overwrite
→mode
internally.save()
dispatches based on mode:append
→ callsinsert()
if supported, else raisesNotImplementedError
.overwrite
,error
,ignore
→ handled viacreate_table
/create_view
with appropriate overwrite flag._describe()
includesmode
.Credentials Support
credentials
param to init signature and docstring.con
string._connect()
routes accordingly based on type._get_backend_name()
extracts backend info for_describe()
.All supported
mode
options were tested using both DuckDB and Postgres backends.credentials
parsing was tested for string, dict-with-con, and dict-with-backend formats.Checklist
jsonschema/kedro-catalog-X.XX.json
if necessaryRELEASE.md
file